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DISCUSSION: LATENT VARIABLE GRAPHICAL MODEL 
SELECTION VIA CONVEX OPTIMIZATION 

By Steffen Lauritzen and Nicolai Meinshausen 
University of Oxford 

We want to congratulate the authors for a thought-provoking and very 
interesting paper. Sparse modeling of the concentration matrix has enjoyed 
popularity in recent years. It has been framed as a computationally conve- 
nient convex £1 -constrained estimation problem in Yuan and Lin (2007) and 
can be applied readily to higher-dimensional problems. The authors argue — 
we think correctly — that the sparsity of the concentration matrix is for many 
applications more plausible after the effects of a few latent variables have 
been removed. The most attractive point about their method is surely that it 
is formulated as a convex optimization problem. Latent variable fitting and 
sparse graphical modeling of the conditional distribution of the observed 
variables can then be obtained through a single fitting procedure. 

Practical aspects. The method deserves wide adoption, but this will only 
be realistic if software is made available, for example, as an R-package. Not 
many users will go to the trouble of implementing the method on their own, 
so we will strongly urge the authors to do so. 

An imputation method. In the absence of readily available software, it is 
worth thinking whether the proposed fitting procedure can be approximated 
by methods involving known and well-tested computational techniques. The 
concentration matrix of observed and hidden variables is 

/ K K OH \ 
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where we have deviated from the notation in the paper by omitting the 
asterisk. The proposed estimator S n = Ko of Ko was defined as 

(1) (K ,L n ) = argmin Sii -i(S - L; E&) + A n ( 7 ||S||i + tr(L)) 

(2) such that S - L y 0, L >- 0, 
where T,q is the empirical covariance matrix of the observed variables. 
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An alternative would be to replace the nuclear-norm penalization with a 
fixed constraint k on the rank of the hidden variables, replacing problem (1) 
with 

(K ,L n ) = argmin 5 L -£(S - L;YIq) + A n ||5||i 

such that S — L y and L >- and rank(L) < k. 

This can be achieved by a missing- value formulation in combination with 
use of the EM algorithm, which also applies in a penalized likelihood setting 
[Green (1990)]. Let the hidden variables be of a fixed dimensionality n and 
assume for a moment these are observed so one would find the concentration 
matrix K of the joint distribution of the observed variables Xq and hidden 
variables Xh based on the complete data penalized likelihood as 

(4) mgmm K -\ogf K {Xo,X H ) + \\\K \\i, 

where fx is the joint density of (Xo,Xh)- This formulation is very similar 
to the missing- value problem treated in Stadler (2012), except for the fact 
that we only penalize the concentration matrix Ko of the observed variables, 
in analogy with the proposed latent-variable approach. The EM algorithm 
iteratively replaces the likelihood in (4) for t = 1, . . . ,T by its conditional 
expectation and thus finds K t+1 as 

(5) K t+1 = a vgmm K -E An {log fK(X ,X H )\X } + X\\Ko\\i. 

The iteration is guaranteed not to increase the negative marginal penalized 
likelihood at every stage and will, save for unidentifiability, converge to the 
minimizer in (3) for most starting values. Without loss of generality, one can 
fix the conditional concentration matrix Kh of the hidden variables to be the 
identity so that these are conditionally independent with variance 1, given 
the observed variables. Then —Koh is equal to the regression coefficients of 
the observed variables on the hidden variables. As starting value we have let 
—Kq H be equal to these with hidden variables determined by a principal 
component analysis. 

The expectation in (5) can be written as the log-likelihood of a Gaus- 
sian distribution with concentration matrix K and empirical covariance ma- 
trix W t , where 



„ o ~ o OI i 

f>t yn t I jyt yn f>t 



The sufficient statistics involving the missing data are thus "imputed" in W l . 
Each of the updates (5) can now be computed with the graphical lasso 
[Friedman, Hastie and Tibshirani (2008)]. 

We thought it would be interesting to compare the two methods on the 
data example given in the paper. Figure 1 shows the solution Ko for the 
stock-return example when using the proposed method (1) and the imputa- 
tion method (4) with 4 iterations. The number k of latent variables and the 
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Fig. 1. The nonzero entries of the concentration matrix Ko, using the pro 
dure (1) (left) and the imputation method in (4) (right). Two representative 
are shown for some of the sectors. 
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number of nonzero edges in Ko is adjusted to be the same as in the original 
estimator. 

The three pairs with the highest absolute entries in the fitted conditional 
concentration matrix are identical (AT&T — Verizon, Schlumberger — Baker 
Hughes and Merrill Lynch — Morgan Stanley) for the two methods and the 15 
pairs with highest absolute entries in the off-diagonal concentration matrix 
have an overlap of size 12. The resulting graphs are slightly different although 
they share many features. Our graph has 136 edges, one more than that in 
the procedure described in the paper, and 77 of the edges are shared. Our 
graph has more isolated vertices (15 vs. 9), slightly fewer cliques (62 vs. 81) 
and the largest clique in our graph has six variables rather than four. The 
graph is displayed to the left in Figure 2 and features some clearly identified 
clusters of variables. 

The selected graph is very unstable under bootstrap simulations. In the 
spirit of Meinshausen and Buhlmann (2010), we fit the graph on 2000 boot- 
strap samples. Only 28 edges are selected in more than half of these samples. 
The resulting graph is shown in Figure 2. As many as 25 of these edges ap- 
pear also as edges of the estimator proposed in (1). It would have been 
interesting to be able to compare with the same "stability graph" of the 
proposed procedure but we suspect that they will match closely. 



Latent directed structures. In a sense the procedure described in this 
paper can be seen as a modification of, or an alternative to, factor analysis, 




Fig. 2. Left: the graph of the imputation method as in (4). Right: the graph of the stable edges. In both cases, isolated vertices have been 
removed from the display. 
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in which independent latent variables are sought to explain all the correla- 
tions, corresponding to the graph for the observed variables being completely 
empty. 

Methods for identifying such models can, for example, be developed using 
tetrad constraints [Spirtes, Glymour and Schemes (1993), Drton, Sturmfels 
and Sullivant (2007)]. Another generalization of factor analysis is to look for 
sparse directed graphical models, which have now been rather well establish- 
ed through, for example, the FCI algorithm [Spirtes, Glymour and Scheines 
(1993), Richardson and Spirtes (2002)] with an algebraic underpinning in 
Sullivant (2008). Again this could be an alternative to the procedure de- 
scribed in this interesting paper. 

Summary. We effectively replaced the nuclear norm penalization of L in 
the paper by a fixed constraint on the rank. This might be easier to do than 
choosing a reasonable value for the penalty on the trace of L. Using this 
formulation, we could combine the EM algorithm with the graphical lasso, 
enabling us to compute the solution with readily available software. It would 
be interesting to see whether our procedure can be shown to recover the 
correct sparsity structure under similar assumptions to those in the paper. 
We want to congratulate the authors again for a very interesting discussion 
paper. 
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